Sentence alignment in bilingual corpora based on crosslingual querying

Authors

  • Frédérique Bisson
  • Christian Fluhr
Abstract

The effectiveness of translation memory for computer-aided translation depends on the results of the preceding sentence alignment. This paper describes a new approach to sentence alignment based on crosslingual querying, using the technology of an existing product, SPIRIT (Syntactic and Probabilistic Indexing and Retrieval of Information in Texts). Sentence alignment and crosslingual querying based on bilingual reformulation are similar problems: both rely on semantic proximity between two texts in different languages, and both aim to find the sentences that contain most of the information demanded by the query. However, sentence alignment additionally requires the irrelevant part of a sentence to be as short as possible. Crosslingual querying provides sentence alignment with candidates. The ARCADE evaluation has shown that this approach is very robust in cases of inverted sentence order and missing segments.
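The idea in the abstract can be illustrated with a toy sketch. This is not the SPIRIT system or the authors' algorithm: the function names (`reformulate`, `score`, `align`), the tiny French-English lexicon, and the scoring formula (query coverage multiplied by the share of candidate words that are matched) are all assumptions made for illustration. Each source sentence is reformulated into target-language terms, every target sentence is treated as a retrieval candidate, and candidates with a long unmatched part are penalized.

```python
from typing import Dict, List, Optional, Set, Tuple


def reformulate(sentence: str, lexicon: Dict[str, Set[str]]) -> List[Set[str]]:
    """Turn a source sentence into a query: one set of translations per known word.

    Words missing from the (hypothetical) lexicon are skipped.
    """
    return [lexicon[w] for w in sentence.lower().split() if w in lexicon]


def score(query_terms: List[Set[str]], candidate: str) -> float:
    """Score a candidate target sentence against a reformulated query."""
    cand_words = candidate.lower().split()
    if not query_terms or not cand_words:
        return 0.0
    cand_set = set(cand_words)
    # Coverage: share of query terms with at least one translation in the candidate.
    covered = sum(1 for translations in query_terms if translations & cand_set)
    coverage = covered / len(query_terms)
    # Relevance: share of candidate words that translate some query term,
    # so a candidate with a long irrelevant part scores lower.
    all_translations = set().union(*query_terms)
    matched = sum(1 for w in cand_words if w in all_translations)
    relevance = matched / len(cand_words)
    return coverage * relevance


def align(
    source: List[str], targets: List[str], lexicon: Dict[str, Set[str]]
) -> List[Tuple[int, Optional[int]]]:
    """Pair each source index with its best target index, or None if nothing matches."""
    pairs: List[Tuple[int, Optional[int]]] = []
    for i, sentence in enumerate(source):
        terms = reformulate(sentence, lexicon)
        scores = [score(terms, t) for t in targets]
        best = max(range(len(targets)), key=scores.__getitem__) if targets else None
        if best is not None and scores[best] <= 0:
            best = None  # no candidate shares any term: treat as a missing segment
        pairs.append((i, best))
    return pairs


# Toy data: target order is inverted and the third source sentence has no counterpart.
LEXICON = {
    "le": {"the"}, "chat": {"cat"}, "dort": {"sleeps"},
    "chien": {"dog"}, "mange": {"eats"},
}
SOURCE = ["le chat dort", "le chien mange", "bonjour"]
TARGETS = ["the dog eats", "the cat sleeps"]

print(align(SOURCE, TARGETS, LEXICON))  # [(0, 1), (1, 0), (2, None)]
```

Because each source sentence queries all target sentences independently, the sketch does not rely on sentence position, which is why inverted order and missing segments do not derail it. It ignores punctuation, morphology, and many-to-one alignments, all of which a real system would have to handle.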


Similar resources

Sentence Alignment in Parallel, Comparable, and Quasi-comparable Corpora

We explore the usability of different bilingual corpora for the purpose of multilingual and cross-lingual natural language processing. The usability of a bilingual corpus is evaluated by the lexical alignment score calculated for the bi-lexicon pair distributed in the aligned bilingual sentence pairs. We compare and contrast a number of bilingual corpora, ranging from parallel, to comparable, and...


Bilingual Lexicon Construction Using Large Corpora

This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building bilingual lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between term ...


Sentence Alignment of Historical Classics based on Mode Prediction and Term Translation Pairs

Parallel corpora are essential resources for the construction of a bilingual term dictionary of historical classics. To obtain large-scale parallel corpora, this paper proposes a sentence alignment method based on mode prediction and term translation pairs. On one hand, the method rebuilds the sentence alignment process according to characteristics of the translation of historical classics, and a...


Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity

Sentence alignment plays an essential role in building bilingual corpora which are valuable resources for many applications like statistical machine translation. In various approaches of sentence alignment, length-and-word-based methods which are based on sentence length and word correspondences have been shown to be the most effective. Nevertheless a drawback of using bilingual dictionaries tr...


Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Sentence-level alignment of bilingual parallel corpora plays a significant and indispensable role in machine translation, translation knowledge acquisition, and bilingual lexicography research, and is fundamental work for natural language processing. Given the great deal of work in sentence alignment and the variety of methods that have been developed for bilingual terminology extraction, those are...



Publication date: 2000